13  File input-output

Reading data from a text file

Data is frequently stored in tabular form in text files. The read.table() function can read a file from your disk and return a data frame containing that data.

help(read.table)

Suppose we have a data file mydata.txt with the following contents:

Can 1.70 65
Cem 1.75 66
Hande 1.62 61
Lale 1.76 64
Arda 1.78 63
Bilgin 1.77 84
Cem 1.69 75
Ozlem 1.75 65
Ali 1.73 75
Haluk 1.71 81

The file can be read into a data frame simply with:

hwdata <- read.table("mydata.txt")
hwdata
       V1   V2 V3
1     Can 1.70 65
2     Cem 1.75 66
3   Hande 1.62 61
4    Lale 1.76 64
5    Arda 1.78 63
6  Bilgin 1.77 84
7     Cem 1.69 75
8   Ozlem 1.75 65
9     Ali 1.73 75
10  Haluk 1.71 81
class(hwdata)
[1] "data.frame"

We can change the column names of the data frame as usual:

names(hwdata) <- c("Name", "Height","Weight")
hwdata
     Name Height Weight
1     Can   1.70     65
2     Cem   1.75     66
3   Hande   1.62     61
4    Lale   1.76     64
5    Arda   1.78     63
6  Bilgin   1.77     84
7     Cem   1.69     75
8   Ozlem   1.75     65
9     Ali   1.73     75
10  Haluk   1.71     81
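The renaming step can also be done while reading, using the col.names parameter of read.table(). The sketch below is illustrative: it passes the first three rows of mydata.txt inline through the text parameter so that it runs without the file on disk, and the variable names sample_lines and hw are chosen only for this example.

```r
# First three rows of mydata.txt, supplied inline via text= so the
# example does not depend on a file being present.
sample_lines <- c(
  "Can 1.70 65",
  "Cem 1.75 66",
  "Hande 1.62 61"
)

# col.names assigns the column names at read time, replacing V1, V2, V3.
hw <- read.table(text = sample_lines,
                 col.names = c("Name", "Height", "Weight"))

names(hw)  # column names as assigned above
str(hw)    # shows the type inferred for each column
```

With a file on disk, the same call would take the file name in place of the text argument.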

The function read.table() is quite versatile and has many parameters that tune its behavior. The documentation shown by help(read.table) lists all of them.

Let’s read a new data file mydata2.txt. It has a header row, and we want to set the column names of the resulting data frame accordingly:

Name Height Weight
Can 1.70 65
Cem 1.75 66
Hande 1.62 61
Lale 1.76 64
Arda 1.78 63
Bilgin 1.77 84
Cem 1.69 75
Ozlem 1.75 65
Ali 1.73 75
Haluk 1.71 81

All we need is to set the header parameter to TRUE:

hwdata <- read.table("mydata2.txt",header = TRUE)
hwdata
     Name Height Weight
1     Can   1.70     65
2     Cem   1.75     66
3   Hande   1.62     61
4    Lale   1.76     64
5    Arda   1.78     63
6  Bilgin   1.77     84
7     Cem   1.69     75
8   Ozlem   1.75     65
9     Ali   1.73     75
10  Haluk   1.71     81

Now we have another file mydata3.txt whose fields are separated with commas, instead of spaces:

Name,Height,Weight
Can,1.70,65
Cem,1.75,66
Hande,1.62,61
Lale,1.76,64
Arda,1.78,63
Bilgin,1.77,84
Cem,1.69,75
Ozlem,1.75,65
Ali,1.73,75
Haluk,1.71,81

To accommodate that, we set the sep parameter to the separator character, a comma:

hwdata <- read.table("mydata3.txt",header = TRUE, sep=",")
hwdata
     Name Height Weight
1     Can   1.70     65
2     Cem   1.75     66
3   Hande   1.62     61
4    Lale   1.76     64
5    Arda   1.78     63
6  Bilgin   1.77     84
7     Cem   1.69     75
8   Ozlem   1.75     65
9     Ali   1.73     75
10  Haluk   1.71     81
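For comma-separated files, base R also provides read.csv(), a wrapper around read.table() whose defaults include header = TRUE and sep = ",". The following sketch compares the two calls on a three-row inline sample in the style of mydata3.txt; the variable names are illustrative only.

```r
# Small comma-separated sample, supplied inline via text=.
csv_lines <- c(
  "Name,Height,Weight",
  "Can,1.70,65",
  "Cem,1.75,66",
  "Hande,1.62,61"
)

# Explicit settings with read.table() ...
via_table <- read.table(text = csv_lines, header = TRUE, sep = ",")

# ... and the same data through read.csv(), relying on its defaults.
via_csv <- read.csv(text = csv_lines)

# Compare the two data frames.
all.equal(via_table, via_csv)
```

Note that read.csv() differs from read.table() in a few other defaults as well; in particular it sets comment.char = "", so comments are not skipped unless comment.char is passed explicitly.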

Now consider a more complicated data file mydata4.txt, which contains some comments added by the data collector.

Name,Height,Weight
Can,1.70,65
Cem,1.75,66
# Here is a comment
Hande,1.62,61
Lale,1.76,64
Arda,1.78,63
Bilgin,1.77,84 # another comment
Cem,1.69,75
Ozlem,1.75,65
Ali,1.73,75
Haluk,1.71,81

The comment character can be set with the comment.char parameter of read.table(). Everything from the comment character to the end of the line is then ignored, whether the comment occupies a whole line or follows the data:

hwdata <- read.table("mydata4.txt",header = TRUE, sep=",", comment.char="#")
hwdata
     Name Height Weight
1     Can   1.70     65
2     Cem   1.75     66
3   Hande   1.62     61
4    Lale   1.76     64
5    Arda   1.78     63
6  Bilgin   1.77     84
7     Cem   1.69     75
8   Ozlem   1.75     65
9     Ali   1.73     75
10  Haluk   1.71     81

Actually, this was a redundant setting, because by default comment.char is already set to "#".
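A quick way to confirm this is to read the same input with and without the explicit setting and compare the results. The sketch below uses a shortened inline sample in the style of mydata4.txt; the variable names are illustrative only.

```r
# Shortened sample with a whole-line comment and a trailing comment.
lines_with_comments <- c(
  "Name,Height,Weight",
  "Can,1.70,65",
  "# Here is a comment",
  "Cem,1.75,66 # another comment",
  "Hande,1.62,61"
)

# Explicit comment character ...
explicit <- read.table(text = lines_with_comments, header = TRUE,
                       sep = ",", comment.char = "#")

# ... and the default setting.
implicit <- read.table(text = lines_with_comments, header = TRUE, sep = ",")

all.equal(explicit, implicit)  # compares the two data frames
nrow(implicit)                 # number of data rows that were read
```

To treat # as ordinary data instead, comment interpretation can be switched off with comment.char = "".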

Sometimes the separator character also appears inside a text field, such as the space within a full name. In such cases, we use quotes to delimit the field’s content, as below (mydata5.txt):

Name Height Weight
"Can Can" 1.70 65
"Cem Cem" 1.75 66
"Hande Hande" 1.62 61
"Lale Lale" 1.76 64
"Arda Arda" 1.78 63
"Bilgin Bilgin" 1.77 84
"Cem Cim" 1.69 75
"Ozlem Ozlem" 1.75 65
"Ali Ali" 1.73 75
"Haluk Haluk" 1.71 81

The function read.table() recognizes both single and double quotes by default:

hwdata <- read.table("mydata5.txt", header=TRUE)
hwdata
            Name Height Weight
1        Can Can   1.70     65
2        Cem Cem   1.75     66
3    Hande Hande   1.62     61
4      Lale Lale   1.76     64
5      Arda Arda   1.78     63
6  Bilgin Bilgin   1.77     84
7        Cem Cim   1.69     75
8    Ozlem Ozlem   1.75     65
9        Ali Ali   1.73     75
10   Haluk Haluk   1.71     81

Other quote characters can be specified using the quote parameter. For example, consider the data file mydata6.txt:

Name Height Weight
%Can Can% 1.70 65
%Cem Cem% 1.75 66
%Hande Hande% 1.62 61
%Lale Lale% 1.76 64
%Arda Arda% 1.78 63
%Bilgin Bilgin% 1.77 84
%Cem Cim% 1.69 75
%Ozlem Ozlem% 1.75 65
%Ali Ali% 1.73 75
%Haluk Haluk% 1.71 81

Writing data to a file

Suppose that we process the data, for example by adding a column for the body mass index (BMI), which is the weight divided by the square of the height:

hwdata <- read.table("mydata6.txt", header=TRUE, quote="%")
hwdata$BMI <- hwdata$Weight / hwdata$Height^2
hwdata$BMI <- round(hwdata$BMI, 2)  # round to two decimal places
hwdata
            Name Height Weight   BMI
1        Can Can   1.70     65 22.49
2        Cem Cem   1.75     66 21.55
3    Hande Hande   1.62     61 23.24
4      Lale Lale   1.76     64 20.66
5      Arda Arda   1.78     63 19.88
6  Bilgin Bilgin   1.77     84 26.81
7        Cem Cim   1.69     75 26.26
8    Ozlem Ozlem   1.75     65 21.22
9        Ali Ali   1.73     75 25.06
10   Haluk Haluk   1.71     81 27.70

The function write.table() can be used to store a data frame in a file.

write.table(hwdata,"mydata7.txt")

This function writes the table together with the row names and column names:

"Name" "Height" "Weight" "BMI"
"1" "Can Can" 1.7 65 22.49
"2" "Cem Cem" 1.75 66 21.55
"3" "Hande Hande" 1.62 61 23.24
"4" "Lale Lale" 1.76 64 20.66
"5" "Arda Arda" 1.78 63 19.88
"6" "Bilgin Bilgin" 1.77 84 26.81
"7" "Cem Cim" 1.69 75 26.26
"8" "Ozlem Ozlem" 1.75 65 21.22
"9" "Ali Ali" 1.73 75 25.06
"10" "Haluk Haluk" 1.71 81 27.7
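A file written this way can be read back with a plain read.table() call. Because the header line has one field fewer than the data lines, read.table() treats it as a header and uses the first column as row names. The sketch below illustrates the round trip on a three-row data frame; it writes to a temporary file rather than mydata7.txt, and the variable names are chosen only for this example.

```r
# Small data frame in the style of hwdata.
hw <- data.frame(
  Name   = c("Can Can", "Cem Cem", "Hande Hande"),
  Height = c(1.70, 1.75, 1.62),
  Weight = c(65, 66, 61)
)
hw$BMI <- round(hw$Weight / hw$Height^2, 2)

# Temporary stand-in for mydata7.txt.
tmp <- tempfile(fileext = ".txt")
write.table(hw, tmp)

# No header argument: read.table() infers it because the first line
# has one field fewer than the remaining lines.
back <- read.table(tmp)

names(back)                                   # column names read from the file
all.equal(back$BMI, hw$BMI)                   # compare the BMI columns
all.equal(as.character(back$Name), hw$Name)   # compare the names
```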

We can omit the row and column names with the following parameter settings.

write.table(hwdata,"mydata7.txt",row.names = FALSE, col.names = FALSE)
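Like read.table(), write.table() accepts sep and quote parameters, so the same function can produce a comma-separated file. The following is a minimal sketch that keeps the column names, drops the row names, and writes to a temporary file; the file and variable names are illustrative only.

```r
# Small data frame in the style of hwdata.
hw <- data.frame(
  Name   = c("Can Can", "Cem Cem", "Hande Hande"),
  Height = c(1.70, 1.75, 1.62),
  Weight = c(65, 66, 61)
)

# Temporary output file used only for this example.
out <- tempfile(fileext = ".csv")

# sep = "," separates fields with commas; quote = FALSE writes strings
# without surrounding quotes; row.names = FALSE drops the row names.
write.table(hw, out, sep = ",", quote = FALSE, row.names = FALSE)

# Inspect the raw lines that were written.
readLines(out)
```

Leaving strings unquoted is safe here only because no name contains a comma; otherwise the default quote = TRUE is the better choice.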